Item Sets that Compress
نویسندگان
چکیده
One of the major problems in frequent item set mining is the explosion of the number of results: it is difficult to find the most interesting frequent item sets. The cause of this explosion is that large sets of frequent item sets describe essentially the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of frequent item sets is that set that compresses the database best. We introduce four heuristic algorithms for this task, and the experiments show that these algorithms give a dramatic reduction in the number of frequent item sets. Moreover, we show how our approach can be used to determine the best value for the min-sup threshold.
منابع مشابه
Compression Picks The Significant Item Sets
Finding a comprehensive set of patterns that truly captures the characteristics of a database is a complicated matter. Frequent item set mining attempts this, but low support levels often result in exorbitant amounts of item sets. Recently we showed that by using MDL we are able to select a small number of item sets that compress the data well [15]. Here we show that this small set is a good ap...
متن کاملSimilarity Data Item Set Approach: An Encoded Temporal Data Base Technique
Data mining has been widely recognized as a powerful tool to explore added value from large-scale databases. Finding frequent item sets in databases is a crucial in data mining process of extracting association rules. Many algorithms were developed to find the frequent item sets. This paper presents a summary and a comparative study of the available FP-growth algorithm variations produced for m...
متن کاملCT-ITL : Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth
Discovering association rules that identify relationships among sets of items is an important problem in data mining. Finding frequent item sets is computationally the most expensive step in association rule discovery and therefore it has attracted significant research attention. In this paper, we present a more efficient algorithm for mining complete sets of frequent item sets. In designing ou...
متن کاملFrequent Patterns that Compress
One of the major problems in frequent pattern mining is the explosion of the number of results, making it difficult to identify the interesting frequent patterns. In a recent paper [14] we have shown that an MDL-based approach gives a dramatic reduction of the number of frequent item sets to consider. Here we show that MDL gives similarly good reductions for frequent patterns on other types of ...
متن کاملThe Distinction of Hot Herbal Compress, Hot Compress, and Topical Diclofenac as Myofascial Pain Syndrome Treatment
This randomized controlled trial aimed to investigate the distinctness after treatment among hot herbal compress, hot compress, and topical diclofenac. The registrants were equally divided into groups and received the different treatments including hot herbal compress, hot compress, and topical diclofenac group, which served as the control group. After treatment courses, Visual Analog Scale and...
متن کامل